Data Cleaning in SQL Server

Scroll down to the bottom of the page and select 'View Project Code' to view the SQL code on GitHub

About the Data

The dataset to be cleaned was regarding Nashville Housing Data. It's raw format was an Excel file. The more relevant data can be summarised into 3 main categories: Property Details (build year, address, value, acreage etc), Sales Details (date of purchase, price) and Owner Details (name, address).

Methodology

The steps for this project are outlined below, along with the SQL techniques utilised to carry out the data cleaning task :

1) Standardizing the Date Format
  • DDL
  • DML
2) Filling in Address Nulls
  • Self joins
3) Cleaning the Addresses
  • String functions - CHARINDEX(), SUBSTRING(), REPLACE(), PARSENAME(), LEN(), LTRIM()
4) Changing 'Y' and 'N' to 'Yes' and 'No' in the [Sold as Vacant] column
  • CASE expression
5) Removing Duplicate Rows
  • CTE's
  • Window functions
6) Delete Unused Columns
  • DDL